-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support SELECT/INSERT/UPDATE/DELETE db operation + span names #1253
base: main
Are you sure you want to change the base?
Conversation
8f0e456
to
5340675
Compare
@@ -8,6 +8,8 @@ import ( | |||
"os" | |||
"strconv" | |||
|
|||
sql "github.com/xwb1989/sqlparser" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked through a couple sql parsing libraries, this isn't the most maintained... https://github.com/krasun/gosqlparser seems good but for some reason I don't think it supports *
in queries (??). For what we're doing now I think it's fine but this could use another look some day
89e64c7
to
7116a06
Compare
c809878
to
97ccfaf
Compare
|
||
if operation != "" { | ||
span.Attributes().PutStr(string(semconv.DBOperationNameKey), operation) | ||
span.SetName(operation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we try to build the {db.query.summary}
or {db.operation.name} {target}
here given we have already parsed the query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally doing db.query.summary
, but we don't have any of the values for {target}
(db.collection, db.namespace, or server address). So, it recommends just using the operation:
If a corresponding
{target}
value is not available for a specific operation, the instrumentation SHOULD omit the{target}
. For example, for an operation describing SQL query on an anonymous table likeSELECT * FROM (SELECT * FROM table) t
, span name should beSELECT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The table might be defined in the query statement though, right? The query parsing library should be able to get this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I could find, unfortunately :\ The first library I tried (https://github.com/krasun/gosqlparser) does have that, which is how I was building it at first. But that library breaks when it tries to parse a *
character (like SELECT * FROM
). Which makes no sense to me, but that's my understanding from trying to debug it and read that library's code. I'm open to add a TODO here for this though because maybe I'm just missing something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should work, right?
package main
import (
"fmt"
"github.com/xwb1989/sqlparser"
)
// Parse takes a SQL query string and returns the parsed query statement type
// and table name, or an error if parsing failed.
func Parse(query string) (string, string, error) {
stmt, err := sqlparser.Parse(query)
if err != nil {
return "", "", fmt.Errorf("failed to parse query: %w", err)
}
switch stmt := stmt.(type) {
case *sqlparser.Select:
return "SELECT", getTableName(stmt.From), nil
case *sqlparser.Update:
return "UPDATE", getTableName(stmt.TableExprs), nil
case *sqlparser.Insert:
return "INSERT", stmt.Table.Name.String(), nil
case *sqlparser.Delete:
return "DELETE", getTableName(stmt.TableExprs), nil
default:
return "UNKNOWN", "", fmt.Errorf("unsupported statement type")
}
}
// getTableName extracts the table name from a SQL node.
func getTableName(node sqlparser.SQLNode) string {
switch tableExpr := node.(type) {
case sqlparser.TableName:
return tableExpr.Name.String()
case sqlparser.TableExprs:
for _, expr := range tableExpr {
if tableName, ok := expr.(*sqlparser.AliasedTableExpr); ok {
if name, ok := tableName.Expr.(sqlparser.TableName); ok {
return name.Name.String()
}
}
}
}
return ""
}
func main() {
queries := []string{
"SELECT * FROM users WHERE id = 1",
"SELECT id, name FROM users WHERE id = 1",
"INSERT INTO orders (id, amount) VALUES (1, 100)",
"UPDATE products SET price = 19.99 WHERE id = 10",
"DELETE FROM sessions WHERE expired = true",
"CREATE TABLE logs (id INT, message TEXT)",
}
for _, query := range queries {
fmt.Println("Query: ", query)
statement, table, err := Parse(query)
if err != nil {
fmt.Println("Error:", err)
continue
}
fmt.Printf("Statement: %s, Table: %s\n", statement, table)
}
}
$ go run .
Query: SELECT * FROM users WHERE id = 1
Statement: SELECT, Table: users
Query: SELECT id, name FROM users WHERE id = 1
Statement: SELECT, Table: users
Query: INSERT INTO orders (id, amount) VALUES (1, 100)
Statement: INSERT, Table: orders
Query: UPDATE products SET price = 19.99 WHERE id = 10
Statement: UPDATE, Table: products
Query: DELETE FROM sessions WHERE expired = true
Statement: DELETE, Table: sessions
Query: CREATE TABLE logs (id INT, message TEXT)
Error: unsupported statement type
FWIW, looking at vitess, it seems like it should be able to support more statement types in the table lookup (i.e. table create).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using "vitess.io/vitess/go/vt/sqlparser"
:
package main
import (
"fmt"
"vitess.io/vitess/go/vt/sqlparser"
)
// Parse takes a SQL query string and returns the parsed query statement type
// and table name(s), or an error if parsing failed.
func Parse(query string) (string, []string, error) {
p, err := sqlparser.New(sqlparser.Options{})
if err != nil {
return "", nil, fmt.Errorf("failed to create parser: %w", err)
}
stmt, err := p.Parse(query)
if err != nil {
return "", nil, fmt.Errorf("failed to parse query: %w", err)
}
var statementType string
var tables []string
switch stmt := stmt.(type) {
case *sqlparser.Select:
statementType = "SELECT"
tables = extractTables(stmt.From)
case *sqlparser.Update:
statementType = "UPDATE"
tables = extractTables(stmt.TableExprs)
case *sqlparser.Insert:
statementType = "INSERT"
tables = []string{stmt.Table.TableNameString()}
case *sqlparser.Delete:
statementType = "DELETE"
tables = extractTables(stmt.TableExprs)
case *sqlparser.CreateTable:
statementType = "CREATE TABLE"
tables = []string{stmt.Table.Name.String()}
case *sqlparser.AlterTable:
statementType = "ALTER TABLE"
tables = []string{stmt.Table.Name.String()}
case *sqlparser.DropTable:
statementType = "DROP TABLE"
for _, table := range stmt.FromTables {
tables = append(tables, table.Name.String())
}
case *sqlparser.CreateDatabase:
statementType = "CREATE DATABASE"
tables = []string{stmt.DBName.String()}
case *sqlparser.DropDatabase:
statementType = "DROP DATABASE"
tables = []string{stmt.DBName.String()}
case *sqlparser.TruncateTable:
statementType = "TRUNCATE TABLE"
tables = []string{stmt.Table.Name.String()}
default:
return "UNKNOWN", nil, fmt.Errorf("unsupported statement type")
}
return statementType, tables, nil
}
// extractTables extracts table names from a list of SQL nodes.
func extractTables(exprs sqlparser.TableExprs) []string {
var tables []string
for _, expr := range exprs {
switch tableExpr := expr.(type) {
case *sqlparser.AliasedTableExpr:
if name, ok := tableExpr.Expr.(sqlparser.TableName); ok {
tables = append(tables, name.Name.String())
}
}
}
return tables
}
func main() {
queries := []string{
"SELECT * FROM users WHERE id = 1",
"SELECT id, name FROM users WHERE id = 1",
"INSERT INTO users (id, name) VALUES (1, 'Alice')",
"UPDATE users SET name = 'Bob' WHERE id = 1",
"DELETE FROM users WHERE id = 1",
"CREATE TABLE users (id INT, name VARCHAR(100))",
"ALTER TABLE users ADD COLUMN age INT",
"DROP TABLE users",
"CREATE DATABASE test_db",
"DROP DATABASE test_db",
"TRUNCATE TABLE users",
}
for _, query := range queries {
fmt.Println("Query: ", query)
statement, tables, err := Parse(query)
if err != nil {
fmt.Printf("Error parsing query: %s\nQuery: %s\n\n", err, query)
continue
}
fmt.Printf("Statement: %s, Tables/DBs: %v\n", statement, tables)
}
}
$ go run .
Query: SELECT * FROM users WHERE id = 1
Statement: SELECT, Tables/DBs: [users]
Query: SELECT id, name FROM users WHERE id = 1
Statement: SELECT, Tables/DBs: [users]
Query: INSERT INTO users (id, name) VALUES (1, 'Alice')
Statement: INSERT, Tables/DBs: [users]
Query: UPDATE users SET name = 'Bob' WHERE id = 1
Statement: UPDATE, Tables/DBs: [users]
Query: DELETE FROM users WHERE id = 1
Statement: DELETE, Tables/DBs: [users]
Query: CREATE TABLE users (id INT, name VARCHAR(100))
Statement: CREATE TABLE, Tables/DBs: [users]
Query: ALTER TABLE users ADD COLUMN age INT
Statement: ALTER TABLE, Tables/DBs: [users]
Query: DROP TABLE users
Statement: DROP TABLE, Tables/DBs: [users]
Query: CREATE DATABASE test_db
Statement: CREATE DATABASE, Tables/DBs: [test_db]
Query: DROP DATABASE test_db
Statement: DROP DATABASE, Tables/DBs: [test_db]
Query: TRUNCATE TABLE users
Statement: TRUNCATE TABLE, Tables/DBs: [users]
@@ -27,6 +29,9 @@ const ( | |||
|
|||
// IncludeDBStatementEnvVar is the environment variable to opt-in for sql query inclusion in the trace. | |||
IncludeDBStatementEnvVar = "OTEL_GO_AUTO_INCLUDE_DB_STATEMENT" | |||
|
|||
// IncludeDBOperationEnvVar is the environment variable to opt-in for sql query operation in the trace. | |||
IncludeDBOperationEnvVar = "OTEL_GO_AUTO_INCLUDE_DB_OPERATION" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this OTEL_GO_AUTO_PARSE_DB_STATEMENT
instead? Give we are going to use the parsed semantics for more than just the operation it might help better convey the intention.
@@ -22,6 +22,7 @@ require ( | |||
github.com/hashicorp/go-version v1.7.0 | |||
github.com/pkg/errors v0.9.1 | |||
github.com/stretchr/testify v1.9.0 | |||
github.com/xwb1989/sqlparser v0.0.0-20180606152119-120387863bf2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we instead use the code this was based on: https://github.com/vitessio/vitess/tree/main/go/vt/sqlparser
It seems like this was a fork of that repository1, but it hasn't been maintained or synced since that fork.
Would it be too large of a dependency to just rely on vitess directly?
Footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did think about using vitess directly, but it seemed like a lot more than what we needed. Since this is just an implementation detail, we could refactor it later if we want but I think this is the best option for right now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned when looking through all the changes that have been applied to the upstream code over the past 6 years, there's a considerable amount of changes. Notably:
- There are many additional statement types that can be parsed: https://pkg.go.dev/vitess.io/vitess/go/vt/sqlparser#StatementType
- The statement type itself can print its name meaning we will not need to add an additional switch statement: https://pkg.go.dev/vitess.io/vitess/go/vt/sqlparser#StatementType.String
- The code is not owned by Google, it is licensed by a CNCF project
- Make Parser Memory Usage More Efficient vitessio/vitess#4174
- Misc fixes vitessio/vitess#2395 which addressed:
- [parser] use table_alias for ENGINE option in CREATE TABLE stmt vitessio/vitess#8307
Should we make our own fork of this pacakge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maybe get a better understanding of dependency size by looking at what the binary size is based on the possible dependencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll admit it looks like you've dug into it deeper than I did, hah. These are some great points so maybe that is the better approach. I'll try switching it to that
Ref #743
From the semconv for db spans:
We don't have
collection
(table),namespace
, orserver.address
readily available (right now), but we can at least parse certain operations. We can also support more over time. So this gets us at least a couple useful span names besidesDB
(fallback for unsupported operations)This means we can also set
db.operation.name
with that same value when it is available.